Authored by: Uvini Wijesinghe
Duration: 10 Weeks
Level: Intermediate
Pre-requisite Skills: Python
Title: Understanding the Relationship Between Tree Characteristics and Bird Populations in Urban Areas
As a: City Ecologist
I want to: Analyze the relationship between tree density, tree diversity, and specific types of trees with bird species richness and abundance in the City of Melbourne.
So that: I can determine whether diverse tree populations support more diverse bird species, indicating a healthier urban environment, and guide strategies for enhancing biodiversity in city landscapes.
Acceptance Criteria:
This analysis uses two datasets:
Birds Dataset: This dataset contains detailed survey data for bird species observed across various river and wetland locations in the City of Melbourne. Conducted by Ecology Australia, these surveys were carried out during daylight hours on multiple dates in February and March 2018, focusing on bird species richness at different sites, including the main site at Dynon Road, West Melbourne, and several reference sites with similar habitat characteristics.
Trees Dataset: The City of Melbourne's tree dataset provides comprehensive information on over 70,000 trees, detailing their location, species, and lifespan across different precincts. This dataset supports the city's Urban Forest Strategy and can be explored through an interactive tree map, offering insights into the diversity and life expectancy of Melbourne's urban forest.
# Import packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from numpy import nan
import plotly.express as px
import requests
from io import StringIO
import folium
from geopy.distance import geodesic
from scipy.stats import chi2_contingency
# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    # apikey = api_key  # use if the dataset requires API key permissions
    export_format = 'csv'  # renamed from `format` to avoid shadowing the built-in
    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        # 'api_key': apikey  # use if the dataset requires API key permissions
    }
    # GET request
    response = requests.get(url, params=params)
    if response.status_code == 200:
        # Wrap the decoded CSV text in StringIO so pandas can read it
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None
# Set dataset_id to query for the API call dataset name
dataset_1_id = 'bird-survey-results-for-areas-in-the-city-of-melbourne-february-and-march-2018'
dataset_2_id = 'trees-with-species-and-dimensions-urban-forest'
# Save datasets
bird_data = collect_data(dataset_1_id)
tree_data = collect_data(dataset_2_id)
# Read the bird csv file
# bird_data = pd.read_csv("bird-survey-results-for-areas-in-the-city-of-melbourne-february-and-march-2018.csv")
bird_data.head(3)
| | sighting_date | common_name | scientific_name | sighting_count | victorian_biodiversity_atlas_code | lat | lon | loc1_desc | lat2 | lon2 | loc2_desc | site_name | location_2 | location_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-03-12 | Australian Magpie | Gymnorhina tibicen | 2 | 10705 | -37.8038 | 144.9118 | Dynon Road Tidal Canal Wildlife Sanctuary Down... | NaN | NaN | NaN | Dynon Road Tidal Canal Wildlife Sanctuary | NaN | -37.8038, 144.9118 |
| 1 | 2018-02-28 | Australian White Ibis | Threskiornis molucca | 141 | 10179 | -37.8038 | 144.9118 | Dynon Road Tidal Canal Wildlife Sanctuary Down... | NaN | NaN | NaN | Dynon Road Tidal Canal Wildlife Sanctuary | NaN | -37.8038, 144.9118 |
| 2 | 2018-03-12 | Australian White Ibis | Threskiornis molucca | 83 | 10179 | -37.8038 | 144.9118 | Dynon Road Tidal Canal Wildlife Sanctuary Down... | NaN | NaN | NaN | Dynon Road Tidal Canal Wildlife Sanctuary | NaN | -37.8038, 144.9118 |
# View info on bird dataset
bird_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 14 columns):
 #   Column                             Non-Null Count  Dtype
---  ------                             --------------  -----
 0   sighting_date                      498 non-null    object
 1   common_name                        498 non-null    object
 2   scientific_name                    498 non-null    object
 3   sighting_count                     498 non-null    int64
 4   victorian_biodiversity_atlas_code  498 non-null    int64
 5   lat                                498 non-null    float64
 6   lon                                498 non-null    float64
 7   loc1_desc                          498 non-null    object
 8   lat2                               248 non-null    float64
 9   lon2                               248 non-null    float64
 10  loc2_desc                          248 non-null    object
 11  site_name                          498 non-null    object
 12  location_2                         248 non-null    object
 13  location_1                         498 non-null    object
dtypes: float64(4), int64(2), object(8)
memory usage: 54.6+ KB
# Delete columns where more than 50% of values are empty
del bird_data['lat2']
del bird_data['lon2']
del bird_data['loc2_desc']
del bird_data['location_2']
# location_1 has no missing values but duplicates lat and lon, so drop it too
del bird_data['location_1']
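The 50% rule can also be checked programmatically instead of being read off the `info()` output. The following is a minimal sketch on a toy frame; the helper name `columns_mostly_missing`, the `threshold` parameter, and the sample values are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def columns_mostly_missing(df, threshold=0.5):
    """Return names of columns whose fraction of missing values exceeds threshold."""
    missing_fraction = df.isna().mean()  # per-column share of NaN/None values
    return missing_fraction[missing_fraction > threshold].index.tolist()

# Toy data (hypothetical values): two columns are 75% empty, one is complete
toy = pd.DataFrame({
    'lat': [-37.80, -37.81, -37.82, -37.83],
    'lat2': [np.nan, np.nan, np.nan, -37.90],
    'loc2_desc': [None, None, 'x', None],
})

to_drop = columns_mostly_missing(toy)
cleaned = toy.drop(columns=to_drop)
```

The same helper could be applied to `bird_data` or `tree_data`, with the threshold adjusted if a stricter or looser rule is preferred.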
# Convert the data type of the Date column into DateTime
bird_data['sighting_date'] = pd.to_datetime(bird_data['sighting_date'])
# View cleaned data
bird_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 9 columns):
 #   Column                             Non-Null Count  Dtype
---  ------                             --------------  -----
 0   sighting_date                      498 non-null    datetime64[ns]
 1   common_name                        498 non-null    object
 2   scientific_name                    498 non-null    object
 3   sighting_count                     498 non-null    int64
 4   victorian_biodiversity_atlas_code  498 non-null    int64
 5   lat                                498 non-null    float64
 6   lon                                498 non-null    float64
 7   loc1_desc                          498 non-null    object
 8   site_name                          498 non-null    object
dtypes: datetime64[ns](1), float64(2), int64(2), object(4)
memory usage: 35.1+ KB
# View cleaned data
bird_data.head(3)
| | sighting_date | common_name | scientific_name | sighting_count | victorian_biodiversity_atlas_code | lat | lon | loc1_desc | site_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-03-12 | Australian Magpie | Gymnorhina tibicen | 2 | 10705 | -37.8038 | 144.9118 | Dynon Road Tidal Canal Wildlife Sanctuary Down... | Dynon Road Tidal Canal Wildlife Sanctuary |
| 1 | 2018-02-28 | Australian White Ibis | Threskiornis molucca | 141 | 10179 | -37.8038 | 144.9118 | Dynon Road Tidal Canal Wildlife Sanctuary Down... | Dynon Road Tidal Canal Wildlife Sanctuary |
| 2 | 2018-03-12 | Australian White Ibis | Threskiornis molucca | 83 | 10179 | -37.8038 | 144.9118 | Dynon Road Tidal Canal Wildlife Sanctuary Down... | Dynon Road Tidal Canal Wildlife Sanctuary |
#read the tree csv file
# tree_data = pd.read_csv("trees-with-species-and-dimensions-urban-forest.csv")
tree_data.head(3)
| | com_id | common_name | scientific_name | genus | family | diameter_breast_height | year_planted | date_planted | age_description | useful_life_expectency | useful_life_expectency_value | precinct | located_in | uploaddate | coordinatelocation | latitude | longitude | easting | northing | geolocation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1029241 | London Plane | Platanus x acerifolia | Platanus | Platanaceae | 59.0 | 1997 | 1997-12-04 | Mature | 6-10 years (>50% canopy) | 10.0 | NaN | Street | 2021-01-10 | -37.834844802361296, 144.97624052189326 | -37.834845 | 144.976241 | 321912.33 | 5810579.39 | -37.834844802361296, 144.97624052189326 |
| 1 | 1357481 | Cyprus Plane | Platanus orientalis | Platanus | Platanaceae | 8.0 | 2008 | 2008-03-12 | Juvenile | 61+ years | 80.0 | NaN | Park | 2021-01-10 | -37.82112379777012, 144.97204161951672 | -37.821124 | 144.972042 | 321509.73 | 5812093.94 | -37.82112379777012, 144.97204161951672 |
| 2 | 1022615 | Spotted Gum | Corymbia maculata | Corymbia | Myrtaceae | 73.0 | 1997 | 1997-11-10 | Mature | 31-60 years | 60.0 | NaN | Street | 2021-01-10 | -37.800407968829234, 144.9624661325885 | -37.800408 | 144.962466 | 320616.72 | 5814374.35 | -37.800407968829234, 144.9624661325885 |
# View tree info
tree_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76928 entries, 0 to 76927
Data columns (total 20 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   com_id                        76928 non-null  int64
 1   common_name                   76903 non-null  object
 2   scientific_name               76927 non-null  object
 3   genus                         76927 non-null  object
 4   family                        76927 non-null  object
 5   diameter_breast_height        24986 non-null  float64
 6   year_planted                  76928 non-null  int64
 7   date_planted                  76928 non-null  object
 8   age_description               24969 non-null  object
 9   useful_life_expectency        24969 non-null  object
 10  useful_life_expectency_value  24969 non-null  float64
 11  precinct                      0 non-null      float64
 12  located_in                    76926 non-null  object
 13  uploaddate                    76928 non-null  object
 14  coordinatelocation            76928 non-null  object
 15  latitude                      76928 non-null  float64
 16  longitude                     76928 non-null  float64
 17  easting                       76928 non-null  float64
 18  northing                      76928 non-null  float64
 19  geolocation                   76928 non-null  object
dtypes: float64(7), int64(2), object(11)
memory usage: 11.7+ MB
# Delete columns with more than 50% of empty values
del tree_data['diameter_breast_height']
del tree_data['age_description']
del tree_data['useful_life_expectency']
del tree_data['useful_life_expectency_value']
del tree_data['precinct']
# Convert the data type of Date columns into DateTime
tree_data['date_planted'] = pd.to_datetime(tree_data['date_planted'])
tree_data['uploaddate'] = pd.to_datetime(tree_data['uploaddate'])
# View cleaned data
tree_data.head(3)
| | com_id | common_name | scientific_name | genus | family | year_planted | date_planted | located_in | uploaddate | coordinatelocation | latitude | longitude | easting | northing | geolocation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1029241 | London Plane | Platanus x acerifolia | Platanus | Platanaceae | 1997 | 1997-12-04 | Street | 2021-01-10 | -37.834844802361296, 144.97624052189326 | -37.834845 | 144.976241 | 321912.33 | 5810579.39 | -37.834844802361296, 144.97624052189326 |
| 1 | 1357481 | Cyprus Plane | Platanus orientalis | Platanus | Platanaceae | 2008 | 2008-03-12 | Park | 2021-01-10 | -37.82112379777012, 144.97204161951672 | -37.821124 | 144.972042 | 321509.73 | 5812093.94 | -37.82112379777012, 144.97204161951672 |
| 2 | 1022615 | Spotted Gum | Corymbia maculata | Corymbia | Myrtaceae | 1997 | 1997-11-10 | Street | 2021-01-10 | -37.800407968829234, 144.9624661325885 | -37.800408 | 144.962466 | 320616.72 | 5814374.35 | -37.800407968829234, 144.9624661325885 |
This section focuses on identifying the top 10 most commonly sighted bird species based on the total number of sightings. The data is grouped by bird species, and the total sightings for each species are calculated. Then, a bar chart is created to visually display these top 10 bird species, showing the total number of times they were seen.
# Group by Common Name and sum the Sighting Count
top_birds = bird_data.groupby('common_name')['sighting_count'].sum().nlargest(10).reset_index()
# Plot the interactive bar chart
fig = px.bar(top_birds,
x='common_name',
y='sighting_count',
title='Top 10 Most Common Birds',
labels={'common_name': 'Bird Species', 'sighting_count': 'Sighting Count'},
hover_data={'sighting_count': True, 'common_name': True},
color_discrete_sequence=['#1f77b4'])
# Show the plot
fig.show()
This section highlights the top 10 locations where birds were most frequently spotted. The data is grouped by site names, and the total number of bird sightings for each location is calculated. A bar chart is then created to visually represent these top 10 locations, showing how many bird sightings occurred at each site.
# Group by site_name and sum the Sighting Count
top_sites = bird_data.groupby('site_name')['sighting_count'].sum().nlargest(10).reset_index()
# Plot the interactive bar chart
fig = px.bar(top_sites,
x='site_name',
y='sighting_count',
title='Top 10 Most Common Sites',
labels={'site_name': 'Site Names', 'sighting_count': 'Sighting Count'},
hover_data={'sighting_count': True, 'site_name': True},
color_discrete_sequence=['#1f77b4'])
# Show the plot
fig.show()
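Total sightings measure abundance, while the user story also asks about species richness, that is, the number of distinct species seen at a site. Both can be computed per site with a single `groupby`. The sketch below uses made-up rows; the site labels and the output column names `richness` and `abundance` are assumptions for illustration.

```python
import pandas as pd

# Toy survey rows (hypothetical sites and counts)
toy_birds = pd.DataFrame({
    'site_name': ['Site A', 'Site A', 'Site A', 'Site B', 'Site B'],
    'common_name': ['Australian Magpie', 'Australian White Ibis',
                    'Australian White Ibis', 'Australian Magpie',
                    'Australian Magpie'],
    'sighting_count': [2, 141, 83, 5, 1],
})

# Richness = number of distinct species; abundance = total individuals counted
site_summary = (
    toy_birds.groupby('site_name')
    .agg(richness=('common_name', 'nunique'),
         abundance=('sighting_count', 'sum'))
    .reset_index()
)
```

Running the same aggregation on `bird_data` would give one row per surveyed site, which could then be compared against tree metrics for the surrounding area.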
This section focuses on analyzing how sightings of the top 10 most common bird species change over time. First, the top 10 bird species are identified, and the dataset is filtered to include only these species. Then, the data is grouped by date and bird species to track the total number of sightings for each species over time. A line chart is created to visualize these trends, with each bird species represented by a different line.
# Identify the top 10 most common birds
top_birds = bird_data.groupby('common_name')['sighting_count'].sum().nlargest(10).index
# Filter the dataset to include only the top 10 bird species
filtered_df = bird_data[bird_data['common_name'].isin(top_birds)]
# Aggregate data by date and common name
time_series_data = filtered_df.groupby(['sighting_date', 'common_name'])['sighting_count'].sum().reset_index()
# Plot the interactive time series data
fig = px.line(time_series_data,
x='sighting_date',
y='sighting_count',
color='common_name',
title='Time Series of Top 10 Bird Species Sightings',
labels={'sighting_date': 'Date', 'sighting_count': 'Sighting Count', 'common_name': 'Bird Species'},
hover_data={'sighting_count': True, 'sighting_date': True, 'common_name': True})
fig.update_layout(legend_title_text='Bird Species')
# Show the plot
fig.show()
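Because surveys took place on only a few dates, it can help to inspect the aggregated numbers as a table before reading the line chart. A minimal sketch, assuming toy rows with the same column names as `bird_data`; the variable names `daily` and `wide` are illustrative only.

```python
import pandas as pd

# Toy sightings (hypothetical counts) on two survey dates
toy = pd.DataFrame({
    'sighting_date': pd.to_datetime(
        ['2018-02-28', '2018-02-28', '2018-03-12', '2018-03-12']),
    'common_name': ['Australian White Ibis', 'Australian Magpie',
                    'Australian White Ibis', 'Australian White Ibis'],
    'sighting_count': [141, 2, 83, 10],
})

# Same aggregation as the notebook: total sightings per date and species
daily = (toy.groupby(['sighting_date', 'common_name'])['sighting_count']
         .sum()
         .reset_index())

# Reshape to one row per date and one column per species;
# species not seen on a date get 0 instead of NaN
wide = (daily.pivot(index='sighting_date', columns='common_name',
                    values='sighting_count')
        .fillna(0)
        .astype(int))
```

The wide table makes gaps explicit: a species that was not recorded on a given date appears as 0, whereas the long format simply has no row for that date.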
This section focuses on visualizing the most common tree species in the dataset. It starts by counting the number of trees for each species and identifies the top 10 most frequently occurring tree species. A bar chart is then created to display the number of trees for each of these species.
# Count the number of trees for each Common Name
tree_counts = tree_data['common_name'].value_counts().nlargest(10).reset_index()
tree_counts.columns = ['common_name', 'Count']
# Plot the bar chart
fig = px.bar(tree_counts,
x='common_name',
y='Count',
title='Number of Trees by Common Name',
labels={'common_name': 'Tree Species', 'Count': 'Number of Trees'},
color_discrete_sequence=['#1f77b4'])
# Show the plot
fig.show()
This section examines the trend of tree plantings over the years. First, it extracts the year from the `date_planted` column to determine when each tree was planted, overwriting the existing `year_planted` column with the derived value. Then, it groups the data by year and counts how many trees were planted each year. A line graph shows the number of trees planted annually and how it changes over time.
# Extract the year from the date_planted column
tree_data['year_planted'] = pd.to_datetime(tree_data['date_planted']).dt.year
# Group by Year Planted and count the number of trees
yearly_plantings = tree_data.groupby('year_planted').size().reset_index(name='Count')
# Plot the line graph
fig = px.line(yearly_plantings,
x='year_planted',
y='Count',
title='Number of Trees Planted Each Year',
labels={'year_planted': 'Year', 'Count': 'Number of Trees'},
markers=True,
color_discrete_sequence=['#1f77b4'])
fig.show()
This section focuses on analyzing the age distribution of trees in the dataset. It calculates the age of each tree by subtracting the year it was planted from the current year. A histogram is then created to show how the ages of the trees are distributed, with the x-axis representing the age in years and the y-axis showing the frequency of trees in each age range.
# Calculate tree age
tree_data['age'] = pd.Timestamp.now().year - tree_data['year_planted']
# Plot the histogram
fig = px.histogram(tree_data,
x='age',
title='Distribution of Tree Ages',
labels={'age': 'Age (Years)'},
hover_data={'age': True},
color_discrete_sequence=['#1f77b4'])
fig.show()
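The age calculation depends on the year in which the notebook is run, so results shift over time. The sketch below shows the same subtraction with an optional fixed reference year, followed by a coarse binning of the ages. The helper `tree_age`, its `reference_year` parameter, and the bin edges are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

def tree_age(year_planted, reference_year=None):
    """Age in years; defaults to the current year, as in the notebook."""
    if reference_year is None:
        reference_year = pd.Timestamp.now().year
    return reference_year - year_planted

# Toy planting years (hypothetical)
toy_trees = pd.DataFrame({'year_planted': [1997, 2008, 1997, 2020]})
toy_trees['age'] = tree_age(toy_trees['year_planted'], reference_year=2024)

# Count trees per age band: (0, 10], (10, 20], (20, 30]
age_bins = (pd.cut(toy_trees['age'], bins=[0, 10, 20, 30])
            .value_counts()
            .sort_index())
```

Fixing the reference year makes the histogram reproducible; omitting it reproduces the notebook's behavior of using the current year.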
This section examines where trees are located, distinguishing between streets and parks. It counts the number of trees found in each location type and creates a horizontal bar chart to display these counts.
# Count the number of trees for each location type
tree_counts = tree_data['located_in'].value_counts().reset_index()
tree_counts.columns = ['located_in', 'Count']
# Plot the bar chart
fig = px.bar(tree_counts,
x='Count',
y='located_in',
title='Number of Trees Located in Streets and Parks',
labels={'located_in': 'Tree Location', 'Count': 'Number of Trees'},
orientation='h',
color_discrete_sequence=['#1f77b4'])
# Show the plot
fig.show()
This section focuses on mapping the locations of the top 5 most common tree species. It first identifies these species and filters the dataset to include only those trees. A base map is created centered on the average location of these trees. Each of the top 5 tree species is represented on the map with markers in different colors. Each marker shows the location of a tree and includes a tooltip with the tree species' name. This visualization helps to easily see where the most common trees are situated across the area.
# Count the occurrences of each tree species
common_trees = tree_data['common_name'].value_counts()
# Determine the most common trees (top 5)
most_common_trees = common_trees.head(5).index.tolist()
# Filter the dataset to include only these most common tree species
filtered_df = tree_data[tree_data['common_name'].isin(most_common_trees)]
# Create a base map
map_center = [filtered_df['latitude'].mean(), filtered_df['longitude'].mean()]
m = folium.Map(location=map_center, zoom_start=12)
# Create a marker for each tree species with a tooltip with different colours for different trees
colors = [
'red', 'blue', 'green', 'purple', 'orange'
]
# Add the trees to the map
for i, tree in enumerate(most_common_trees):
    tree_data_x = filtered_df[filtered_df['common_name'] == tree]
    for _, row in tree_data_x.iterrows():
        folium.CircleMarker(
            location=[row['latitude'], row['longitude']],
            radius=5,
            color=colors[i],
            fill=True,
            fill_color=colors[i],
            fill_opacity=0.6,
            tooltip=folium.Tooltip(f"Tree: {row['common_name']}")
        ).add_to(m)
# Display the map
m
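Adding one marker per tree for the five most common species can mean many thousands of markers, which may make the map slow to render. One possible mitigation, sketched here under that assumption, is to cap the number of trees per species before building the map. The helper `sample_per_species`, its parameters, and the toy rows are hypothetical and not part of the original notebook.

```python
import pandas as pd

def sample_per_species(df, max_per_species, seed=0):
    """Keep at most max_per_species randomly chosen rows for each common_name."""
    parts = [
        group.sample(n=min(len(group), max_per_species), random_state=seed)
        for _, group in df.groupby('common_name')
    ]
    return pd.concat(parts, ignore_index=True)

# Toy trees (hypothetical coordinates): 5 London Planes and 2 Spotted Gums
toy_trees = pd.DataFrame({
    'common_name': ['London Plane'] * 5 + ['Spotted Gum'] * 2,
    'latitude': [-37.80, -37.81, -37.82, -37.83, -37.84, -37.80, -37.81],
    'longitude': [144.96, 144.97, 144.98, 144.96, 144.97, 144.95, 144.96],
})

sampled = sample_per_species(toy_trees, max_per_species=3)
counts = sampled['common_name'].value_counts().to_dict()
```

The sampled frame could be passed to the marker loop in place of `filtered_df`. Sampling trades completeness for responsiveness, so it suits a quick overview rather than an exact inventory.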